What the app takes as input

  • A single-cell dataset pre-clustered with either RaceID3 or Monocle2 and saved as a serialized R object (RDS or RData file).
  • The dataset can be produced either by the snakePipes scRNAseq workflow or by a custom analysis, as long as all the slots are populated.
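
Saving a processed object in either file format can be sketched in base R as follows. The object `sc` is a hypothetical stand-in (a plain list), not a real RaceID3 or Monocle2 object, and the file names are purely illustrative.

```r
# Stand-in for a processed single-cell object (hypothetical; a real one
# would be created by RaceID3 or Monocle2).
sc <- list(counts = matrix(1:6, nrow = 2), clusters = c(1, 1, 2))

out_dir <- tempdir()
rds_file <- file.path(out_dir, "sc_example.rds")
rdata_file <- file.path(out_dir, "sc_example.RData")

# RDS stores a single object without its name.
saveRDS(sc, file = rds_file)

# RData stores one or more objects together with their names.
save(sc, file = rdata_file)

# Reading back: readRDS() returns the object, load() restores it by name.
sc_from_rds <- readRDS(rds_file)
env <- new.env()
loaded_names <- load(rdata_file, envir = env)
```

With an RDS file the object name is chosen on reading, whereas an RData file brings back whatever names were used on saving.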

What the app can do

  • Visualize the original clusters on a t-SNE map
  • Let the user specify the desired number of clusters, re-cluster the dataset and visualize the updated clusters on a t-SNE map
  • Extract cluster marker genes (up to 10 per cluster), summarize them in a table and visualize them on a heatmap
  • Let the user provide one or more gene IDs and visualize their expression on a t-SNE map
  • Calculate and list the top 10 genes most correlated with the genes provided by the user
  • Plot pairwise expression for genes provided by the user

How the app works

  • The user uploads an RDS or RData file not exceeding 100 MB.
  • A single-cell object formatted and preprocessed with the respective package is loaded into memory.
  • The processing is done either with RaceID3 functions or with a combination of Monocle2 (clustering, visualization) and Seurat3 (marker gene extraction and visualization) functions. With RaceID3, outlier detection is performed but ignored for visualization, so that only the original clusters are considered.
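
The upload and loading steps can be approximated outside the app with a small base R helper. This is only a sketch: `load_sc_object` is a hypothetical name, and the assumption that an RData file holds a single object of interest is ours, not a documented property of the app.

```r
# Hypothetical helper: load a serialized object from an RDS or RData file,
# refusing files above a size limit (100 MB, as stated above).
load_sc_object <- function(path, max_mb = 100) {
  size_mb <- file.size(path) / 1024^2
  if (is.na(size_mb)) stop("File not found: ", path)
  if (size_mb > max_mb) {
    stop(sprintf("File is %.1f MB; the limit is %g MB.", size_mb, max_mb))
  }
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    return(readRDS(path))
  }
  if (ext %in% c("rdata", "rda")) {
    env <- new.env()
    obj_names <- load(path, envir = env)
    # Assumption: the file holds one object of interest; take the first.
    return(get(obj_names[1], envir = env))
  }
  stop("Unsupported file extension: ", ext)
}

# Usage with a toy file
tmp <- file.path(tempdir(), "toy.rds")
saveRDS(list(a = 1), tmp)
obj <- load_sc_object(tmp)
```

Checking the file size before reading avoids spending time on a file that would be rejected anyway.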

A note on processing speed

  • The more data is loaded, the longer some of the functions will take. The app is not meant to work with very large datasets! Try to stay below 5000 cells.
  • Data loading is rather slow; please allow up to a minute, depending on the size of your dataset. Depending on the package of choice, re-clustering or extracting cluster markers may also be slow.
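
If a dataset is larger than that, one option is to subsample cells before clustering it and saving it for the app. The helper below is a minimal base R sketch on a plain genes x cells matrix; `subsample_cells` is a hypothetical name, and subsampling has to happen before the initial clustering so that the stored cluster assignments match the retained cells.

```r
# Hypothetical helper: randomly keep at most `max_cells` columns (cells)
# of a genes x cells count matrix.
subsample_cells <- function(counts, max_cells = 5000, seed = 1) {
  if (ncol(counts) <= max_cells) return(counts)
  set.seed(seed)
  keep <- sort(sample.int(ncol(counts), max_cells))
  counts[, keep, drop = FALSE]
}

# Toy matrix: 10 genes x 6000 cells of simulated counts.
set.seed(42)
toy <- matrix(rpois(10 * 6000, lambda = 2), nrow = 10)
small <- subsample_cells(toy)
```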

Dataset selection

Select the R package that you’d like to conduct the analysis with from the “Select R package” pulldown list.

To upload a dataset, use the ‘Browse’ button in the “Choose file to upload” field.

Hint: the dataset must be fully processed and contain the initial clustering information!
Hint 2: for RaceID, the dataset must be preprocessed with version 3 of the package; for Monocle, with version 2.

This vignette showcases the use of a dataset from a custom path.

Example 1: RaceID

A published dataset stored under “/data/processing/scRNAseq_shiny_app_example_data/GSE81076_raceid.workspaceR/sc.minT1000.RData” will be analyzed. See Grün D, Muraro MJ, Boisset JC, Wiebrands K et al. De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 2016 Aug 4;19(2):266-277 for the original publication.

Select RaceID3 as analysis R package. Upload the dataset (wait until complete) and click on ‘Select dataset’ (Figure 1).

Figure 1: R package selection and dataset upload

After a short delay, the head of the normalized data appears in the “Input Data” tab (Figure 2). You can also check the dimensions of your matrix and the summary of the TPC (transcripts per cell) distribution in the corresponding boxes.

Figure 2: Head of normalized counts and data summary

In the “tSNE map and clustering” tab, a plot of the within-cluster dispersion as a function of cluster number will appear in the “Metrics for cluster number selection” box (Figure 3). A silhouette plot illustrating cluster assignment quality as well as a cluster membership t-SNE plot for the preselected number of clusters are displayed for the loaded dataset. Use this information to guide your cluster number choice as described in the package vignette.
The dataset was originally clustered into 6 clusters.
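
To illustrate what a dispersion-versus-cluster-number plot conveys, here is a small base R sketch on simulated data. It uses k-means from the `stats` package purely as a stand-in; it is not the clustering procedure of RaceID3, and the toy data are invented for the example.

```r
# Toy data: 150 "cells" in 2 dimensions forming three groups.
set.seed(1)
group_means <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, group_means[i, 1], 0.5), rnorm(50, group_means[i, 2], 0.5))
}))

# Total within-cluster sum of squares for k = 1..8.
# k-means is a stand-in here, not the method used by the app.
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

plot(ks, wss, type = "b", xlab = "Number of clusters",
     ylab = "Within-cluster dispersion")
```

In such a plot the dispersion typically drops steeply until the number of clusters matches the structure in the data and flattens afterwards; the bend is a common heuristic for choosing the cluster number.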

Figure 3: Cluster quality metrics for loaded dataset

Suppose you decide to change the number of clusters, e.g. to 3. Update the value on the slider and click on ‘Update cluster plots’. This initiates re-clustering (Figure 4), and after some waiting time, the updated t-SNE and silhouette plots replace the old plots (Figure 5).

Figure 4: Update cluster number choice

Figure 5: Plots for updated cluster number

To obtain marker genes for each cluster (2 per cluster by default), click on ‘Get marker genes’ in the bottom half of the page (Figure 6). After a (rather long) while, a table with the top markers and a corresponding heatmap appear (Figure 7).

Figure 6: Request top marker genes

Figure 7: Top marker genes result

To increase the number of markers displayed in the table and on the heatmap, move the slider above the table. Both outputs will be updated (Figure 8).

Figure 8: Update the number of marker genes displayed

To download the marker table, use the ‘Download table’ button (Figure 9).

Figure 9: Download cluster marker table

In the “Marker Gene Visualization” tab, you may plot expression of selected genes, as long as they are expressed in at least 1 cell in the dataset. To select a gene, copy one of the top markers into the “GeneID” field in the box and click on ‘Select genes’ (Figure 10).

Figure 10: Select gene IDs for visualization

Check in the ‘Genes used’ field that the selected gene(s) are expressed (Figure 10).

Modify the plot title and expression scale if needed, and click on ‘Plot tsne map’ to visualize the expression of the selected gene(s) (Figure 11).

Figure 11: Tsne map with marker gene expression

In the “Correlation Analyses” tab, you may query your dataset for the genes most correlated with your genes of interest and obtain a pairwise gene expression plot. Again, enter a gene ID in the side box and click on the “Select genes” button in this tab (Figure 12).

Figure 12: Select gene IDs for correlation analysis

A violin plot of the Pearson correlation calculated for log2-transformed counts will appear, alongside a list of the top 10 genes with the highest absolute correlation to the selected genes (Figure 13).
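
The ranking described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the app's code: `top_correlated` is a hypothetical helper, the input is a plain genes x cells matrix, and the pseudocount of 1 added before the log2 transform is our assumption.

```r
# Hypothetical helper: top-n genes by absolute Pearson correlation with a
# query gene, computed on log2-transformed counts.
# `counts` is a genes x cells matrix with gene IDs as row names.
# Assumption: a pseudocount of 1 is added before the log transform.
top_correlated <- function(counts, gene, n = 10, pseudocount = 1) {
  stopifnot(gene %in% rownames(counts))
  logc <- log2(counts + pseudocount)
  others <- setdiff(rownames(counts), gene)
  # cor() correlates columns, so transpose to cells x genes.
  r <- cor(t(logc[others, , drop = FALSE]), logc[gene, ],
           method = "pearson")[, 1]
  # Rank by absolute value but keep the sign in the output.
  r <- r[order(abs(r), decreasing = TRUE)]
  head(r, n)
}

# Toy data: 20 genes x 50 cells; gene2 is made to follow gene1.
set.seed(7)
toy <- matrix(rpois(20 * 50, lambda = 5), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("cell", 1:50)))
toy["gene2", ] <- toy["gene1", ] * 2
top <- top_correlated(toy, "gene1", n = 10)
```

Ranking by absolute correlation means that strongly anti-correlated genes appear in the list alongside positively correlated ones.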

Figure 13: Display top correlated genes

To plot pairwise expression for selected genes, enter gene IDs into the boxes for the X and Y axes in the bottom half of the page, adjust the plot title if necessary, and click on the “Plot expression” button (Figure 14).

Figure 14: Select gene IDs for pairwise expression plot

A pairwise plot of normalized counts will appear (Figure 15).

Example 2: Monocle

A published dataset stored under “/data/processing/scRNAseq_shiny_app_example_data/GSE81076_monocle.workspaceR/minT5000.mono.set.RData” will be analyzed. See Grün D, Muraro MJ, Boisset JC, Wiebrands K et al. De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 2016 Aug 4;19(2):266-277 for the original publication.

Select Monocle as the analysis R package. Upload the dataset (wait until complete) and click on ‘Select dataset’ (Figure 16).

Figure 16: R package selection and dataset upload

After a short delay, the head of the normalized data appears in the “Input Data” tab (Figure 17). You can also check the dimensions of your matrix and the summary of the TPC (transcripts per cell) distribution in the corresponding boxes.

Figure 17: Head of normalized counts and dataset summary

In the “tSNE map and clustering” tab, a plot of delta (distance) versus rho (density) will appear in the “Metrics for cluster number selection” box (Figure 18). A silhouette plot illustrating cluster assignment quality as well as a cluster membership t-SNE plot for the preselected number of clusters are displayed for the loaded dataset. Use this information to guide your cluster number choice as described in the package vignette.
The dataset was originally clustered into 17 clusters (Figure 18).

Figure 18: Cluster quality metrics for loaded dataset

Suppose you decide to change the number of clusters, e.g. to 3. Update the value on the slider and click on ‘Update cluster plots’. This initiates re-clustering (Figure 19), and after some waiting time, the updated t-SNE and silhouette plots replace the old plots (Figure 20).

Figure 19: Update cluster number choice

Figure 20: Plots for updated cluster number

To obtain marker genes for each cluster (2 per cluster by default), click on ‘Get marker genes’ in the bottom half of the page (Figure 21). After a (rather long) while, a table with the top markers and a corresponding heatmap appear (Figure 22). This calculation is done using the scRNA-seq analysis R package ‘Seurat’.

Figure 21: Request top marker genes

Figure 22: Top marker genes result

To increase the number of markers displayed in the table and on the heatmap, move the slider above the table. Both outputs will be updated (Figure 23).

Figure 23: Update the number of marker genes displayed

To download the marker table, use the ‘Download table’ button (Figure 24).

Figure 24: Download cluster marker table

In the “Marker Gene Visualization” tab, you may plot expression of selected genes, as long as they are expressed in at least 1 cell in the dataset. To select a gene, copy one of the top markers into the “GeneID” field in the box and click on ‘Select genes’ (Figure 25).

Figure 25: Select gene IDs for visualization

Check in the ‘Genes used’ field that the selected gene(s) are expressed (Figure 25).

Modify the plot title and expression scale if needed, and click on ‘Plot tsne map’ to visualize the expression of the selected gene(s) (Figure 26).

Figure 26: Tsne map with marker gene expression

In the “Correlation Analyses” tab, you may query your dataset for the genes most correlated with your genes of interest and obtain a pairwise gene expression plot. Again, enter a gene ID in the side box and click on the “Select genes” button in this tab (Figure 27).

Figure 27: Select gene IDs for correlation analysis

A violin plot of the Pearson correlation calculated for log2-transformed counts will appear, alongside a list of the top 10 genes with the highest absolute correlation to the selected genes (Figure 28).

Figure 28: Display top correlated genes

To plot pairwise expression for selected genes, enter gene IDs into the boxes for the X and Y axes in the bottom half of the page, adjust the plot title if necessary, and click on the “Plot expression” button (Figure 29).

Figure 29: Select gene IDs for pairwise expression plot

A pairwise plot of normalized counts will appear (Figure 30).

General: Documenting your analysis

To keep track of the parameters you used to generate your plots, it is recommended that you encode them either in the plot titles (customizable by the user) or in the file names under which you save your plots.
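
One way to encode parameters in a file name is sketched below. The helper `plot_file_name` and the naming scheme are hypothetical; any consistent scheme serves the same purpose.

```r
# Hypothetical helper: build a plot file name that records the analysis
# package, the number of clusters and, optionally, the plotted gene.
plot_file_name <- function(package, n_clusters, gene = NULL, ext = "pdf") {
  parts <- c(package, sprintf("k%d", as.integer(n_clusters)), gene)
  paste0(paste(parts, collapse = "_"), ".", ext)
}

fname_gene <- plot_file_name("RaceID3", 3, gene = "GeneA")
fname_clusters <- plot_file_name("Monocle2", 17)
```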

To keep track of the R and R package versions, you might want to inspect the ‘sessionInfo’ tab. This contains the output of the sessionInfo() R command (Figure 31). At the bottom of the page, two buttons are available (Figure 32). Click on ‘Download session info’ or ‘Download your data’ to save the respective file on your computer.

Figure 31: Session Info tab

Figure 32: Download documentation and modified dataset
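
The same kind of record can be produced in your own R session, for example when preparing a dataset for the app. A minimal sketch, with an illustrative file name:

```r
# Write the output of sessionInfo() to a text file.
info_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), info_file)
```

Keeping this file next to the saved plots documents which R and package versions produced them.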

Lastly, the code behind the app can be retrieved from “https://github.com/maxplanck-ie/scRNAseq_shiny_app” for the given version of the app. The version is shown at the bottom of the side bar (Figure 33).

Figure 33: App version